The spacecraft database is in Excel format. Mostly empty columns and columns that are irrelevant to this analysis have been removed. The database contains information about each spacecraft's orbital parameters, mass, country of origin, launch site, purpose, etc.
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn import preprocessing
from collections import Counter
%matplotlib inline
df = pd.read_excel('UCS-Satellite-Database-8-1-2020.xls')
df.head()
| | Name | Country | Users | Purpose | Class of Orbit | Longitude of GEO | Perigee | Apogee | Eccentricity | Inclination | Period | Launch Mass | Launch Site | Launch Vehicle | Year of Launch |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1HOPSAT | USA | Commercial | Earth Observation | LEO | 0.0 | 566 | 576 | 0.000720 | 36.90 | 96.08 | 22.0 | Satish Dhawan Space Centre | PSLV | 2019 |
| 1 | 3Cat-1 | Spain | Civil | Technology Development | LEO | 0.0 | 476 | 500 | 0.001750 | 97.40 | 95.00 | 4.0 | Satish Dhawan Space Centre | PSLV | 2018 |
| 2 | Aalto-1 | Finland | Civil | Technology Development | LEO | 0.0 | 497 | 517 | 0.001454 | 97.45 | 94.70 | 4.5 | Satish Dhawan Space Centre | PSLV | 2017 |
| 3 | AAUSat-4 | Denmark | Civil | Earth Observation | LEO | 0.0 | 442 | 687 | 0.017665 | 98.20 | 95.90 | 1.0 | Guiana Space Center | Soyuz 2.1a | 2016 |
| 4 | ABS-2 | Multinational | Commercial | Communications | GEO | 75.0 | 35778 | 35793 | 0.000178 | 0.08 | 1436.03 | 6330.0 | Guiana Space Center | Ariane 5 ECA | 2014 |
sc_countries = df['Country'].value_counts()
sc_countries.head(20)
USA               1406
China              375
Russia             170
United Kingdom     129
Japan               80
Multinational       64
India               58
ESA                 53
Canada              39
Germany             33
Luxembourg          32
Spain               21
South Korea         17
Argentina           16
Israel              16
Australia           13
Saudi Arabia        13
France              12
Netherlands         12
Italy               12
Name: Country, dtype: int64
sc_countries.head(20).plot.bar(color='#483D8B', figsize=(15,5), xlabel='Country',
ylabel='Number of spacecraft', title='Top spacecraft operating countries')
<AxesSubplot:title={'center':'Top spacecraft operating countries'}, xlabel='Country', ylabel='Number of spacecraft'>
sc_purpose = df['Purpose'].value_counts()
sc_purpose.head(10)
Communications                           1364
Earth Observation                         785
Technology Development                    323
Navigation/Global Positioning             138
Space Science                              88
Earth Science                              13
Navigation/Regional Positioning            12
Technology Demonstration                    9
Communications/Technology Development       8
Space Observation                           8
Name: Purpose, dtype: int64
sc_purpose.head(6).plot.pie(figsize=(10, 10), autopct='%1.1f%%', title='Satellite prime purpose',
colors=['#DDA5B6', '#F2CC8C', '#F1E6C1', '#3F6A8A', '#4D5E72', '#8FB9A8']);
sc_launchyear = df['Year of Launch'].value_counts()
sc_launchyear.plot(style='d', figsize=(10, 8), title='Launched by year', xlabel='Year', ylabel='Spacecraft launched')
<AxesSubplot:title={'center':'Launched by year'}, xlabel='Year', ylabel='Spacecraft launched'>
sc_launchsite = df['Launch Site'].value_counts()
sc_launchsite.head(15)
Cape Canaveral                      778
Baikonur Cosmodrome                 337
Guiana Space Center                 276
Satish Dhawan Space Centre          273
Vandenberg AFB                      242
Jiuquan Satellite Launch Center     154
Xichang Satellite Launch Center     124
Taiyuan Launch Center               117
Plesetsk Cosmodrome                 114
Vostochny Cosmodrome                 64
Dombarovsky Air Base                 45
Rocket Lab Launch Complex 1          45
International Space Station          41
Tanegashima Space Center             41
Wallops Island Flight Facility       33
Name: Launch Site, dtype: int64
sc_launchsite.head(15).plot.bar(color='#8FB9A8', figsize=(15,5), xlabel='Launch Site',
ylabel='Number of launches', title='Spacecraft launching sites')
<AxesSubplot:title={'center':'Spacecraft launching sites'}, xlabel='Launch Site', ylabel='Number of launches'>
sc_users = df['Users'].value_counts()
sc_users.head(10)
Commercial               1514
Government                452
Military                  357
Civil                     140
Government/Commercial     114
Military/Commercial        77
Military/Government        57
Government/Civil           43
Commercial/Civil           11
Military/Civil              7
Name: Users, dtype: int64
sc_users.head(4).plot.pie(figsize=(10, 10), autopct='%1.1f%%', title='Satellite main users',
colors=['#DDA5B6', '#F2CC8C', '#F1E6C1', '#3F6A8A']);
Let's try to cluster the spacecraft by orbit class using perigee and eccentricity. (Yes, we already know the orbit class from the 'Class of Orbit' column, but let's do it anyway, just to see whether it is possible.)
df['Class of Orbit'].value_counts()
LEO           2032
GEO            560
MEO            137
Elliptical      58
Name: Class of Orbit, dtype: int64
There are four orbit classes in our dataset, so there should be 4 clusters.
X = df[['Perigee', 'Eccentricity']]
fig = px.scatter(df, x='Perigee', y='Eccentricity')
fig.show()
kmeans = KMeans(n_clusters=4, random_state=13)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)
plt.rcParams['figure.figsize'] = [15, 10]
plt.scatter(df['Perigee'], df['Eccentricity'], c=y_kmeans, s=50, cmap='viridis')
plt.xlabel('Perigee')
plt.ylabel('Eccentricity');
This does not look right at all. Let's try scaling the data.
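Scaling is worth trying because K-means groups points by Euclidean distance, and the two features live on very different scales. The sketch below uses two rows from the `df.head()` output above (AAUSat-4 and 3Cat-1) to show how much of the squared distance comes from perigee alone. The variable names are only illustrative, and the perigee values are assumed to be in kilometres.

```python
import math

# (perigee, eccentricity) pairs copied from the df.head() output above.
# Perigee is assumed to be in kilometres; eccentricity is dimensionless.
aausat4 = (442, 0.017665)
cat1 = (476, 0.001750)

d_perigee = aausat4[0] - cat1[0]
d_ecc = aausat4[1] - cat1[1]

# Euclidean distance between the two satellites in raw feature space
distance = math.hypot(d_perigee, d_ecc)

# Fraction of the squared distance that comes from the perigee difference
perigee_share = d_perigee**2 / (d_perigee**2 + d_ecc**2)

print(f"distance = {distance:.6f}, perigee share = {perigee_share:.8f}")
```

With raw values the eccentricity difference is practically invisible to the distance, so the unscaled K-means is effectively clustering on perigee only.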
# Divide each column by its maximum absolute value so both features share a comparable range
X_norm = preprocessing.normalize(X, axis=0, norm='max')
kmeans_n = KMeans(n_clusters=4, random_state=13)
kmeans_n.fit(X_norm)
y_kmeans_n = kmeans_n.predict(X_norm)
plt.rcParams['figure.figsize'] = [15, 10]
plt.scatter(df['Perigee'], df['Eccentricity'], c=y_kmeans_n, s=50, cmap='viridis')
plt.xlabel('Perigee')
plt.ylabel('Eccentricity');
So much better! Let's compare our results with the dataset.
Counter(y_kmeans_n) # K-means results
Counter({0: 2063, 1: 562, 3: 118, 2: 44})
df['Class of Orbit'].value_counts()
LEO           2032
GEO            560
MEO            137
Elliptical      58
Name: Class of Orbit, dtype: int64
Not perfect, but the cluster sizes are pretty close to the class counts.
df_labels = pd.factorize(df['Class of Orbit'])[0] # encode orbit classes as integer labels
plt.rcParams['figure.figsize'] = [15, 10]
plt.scatter(df['Perigee'], df['Eccentricity'], c=df_labels, s=50, cmap='viridis')
plt.xlabel('Perigee')
plt.ylabel('Eccentricity');
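Matching cluster sizes do not by themselves show that the same satellites ended up together, so a per-class breakdown is a stricter check. Below is a minimal sketch of such a contingency count using `Counter`. The helper names `contingency` and `purity` and the toy labels are hypothetical and only illustrate the idea; they are not results from the dataset.

```python
from collections import Counter

def contingency(true_labels, cluster_labels):
    """Count how many items fall into each (true class, cluster) pair."""
    return Counter(zip(true_labels, cluster_labels))

def purity(true_labels, cluster_labels):
    """Share of items that belong to the majority class of their cluster."""
    table = contingency(true_labels, cluster_labels)
    best_per_cluster = {}
    for (_, cluster), count in table.items():
        best_per_cluster[cluster] = max(best_per_cluster.get(cluster, 0), count)
    return sum(best_per_cluster.values()) / len(true_labels)

# Toy example (made-up labels, NOT taken from the dataset)
toy_true = ['LEO', 'LEO', 'LEO', 'GEO', 'GEO', 'MEO', 'Elliptical']
toy_clusters = [0, 0, 0, 1, 1, 3, 0]

table = contingency(toy_true, toy_clusters)
score = purity(toy_true, toy_clusters)
print(table)
print(f"purity = {score:.3f}")
```

In the notebook itself, `pd.crosstab(df['Class of Orbit'], y_kmeans_n)` should give the same kind of table directly for the real labels.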
Let's add some features and see whether that improves the clustering.
X_aug = df[['Perigee', 'Eccentricity', 'Apogee', 'Period']]
X_aug_norm = preprocessing.normalize(X_aug, axis=0, norm='max')
kmeans_n_aug = KMeans(n_clusters=4, random_state=13)
kmeans_n_aug.fit(X_aug_norm)
y_kmeans_n_aug = kmeans_n_aug.predict(X_aug_norm)
plt.rcParams['figure.figsize'] = [15, 10]
plt.scatter(df['Perigee'], df['Eccentricity'], c=y_kmeans_n_aug, s=50, cmap='viridis')
plt.xlabel('Perigee')
plt.ylabel('Eccentricity');
Counter(y_kmeans_n_aug) # K-means results
Counter({0: 2063, 1: 562, 3: 118, 2: 44})
Apparently not: the cluster sizes are the same as before.
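One plausible reason, not verified against the full dataset, is that apogee and period carry no new information: for a Keplerian orbit both follow from perigee and eccentricity. The sketch below recomputes them for the ABS-2 row shown in `df.head()`. The function name is hypothetical, and the constants are assumptions: a mean Earth radius of 6371 km as the reference for altitudes, and the standard gravitational parameter of Earth.

```python
import math

MU_EARTH = 398600.4418  # km^3/s^2, standard gravitational parameter of Earth
R_EARTH = 6371.0        # km, mean Earth radius (assumed reference for altitudes)

def apogee_and_period(perigee_km, eccentricity):
    """Derive apogee altitude (km) and period (minutes) from perigee and eccentricity."""
    r_p = R_EARTH + perigee_km           # perigee radius from Earth's centre
    a = r_p / (1.0 - eccentricity)       # semi-major axis: r_p = a * (1 - e)
    r_a = a * (1.0 + eccentricity)       # apogee radius: r_a = a * (1 + e)
    period_s = 2.0 * math.pi * math.sqrt(a**3 / MU_EARTH)  # Kepler's third law
    return r_a - R_EARTH, period_s / 60.0

# ABS-2 row from df.head(): perigee 35778, eccentricity 0.000178
apogee_km, period_min = apogee_and_period(35778, 0.000178)
print(f"apogee = {apogee_km:.1f} km, period = {period_min:.2f} min")
```

The recomputed values land close to the Apogee (35793) and Period (1436.03) listed for ABS-2, which is consistent with the extra columns being redundant for clustering.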